Gmail Garry Morrison <garry.morrison@gmail.com>

teaser ...
4 messages

Garry Morrison <garry.morrison@gmail.com> Fri, Aug 9, 2013 at 8:53 PM
To: Hiroaki Tani <hiro.tani@gmail.com>
Hey Hiro,

I think I've made a major breakthrough in general knowledge representation!
Either that or I'm crazy :)

It still has some sharp edges, but already can represent an awful lot of stuff.
Details when I get time.

Hint: It is a form of bra-ket + operator notation, plus a couple of additions of my own.
http://en.wikipedia.org/wiki/Bra-ket
(actually, it follows from last time when I was talking about superposition of knowledge. I asked myself if there was a corresponding meaning for the bra component, and then everything quite quickly followed from there).

I guess you could say my scheme is a general way to represent (relatively) static knowledge.
And bra-ket+operators in QM is sort of a subset of this, and represents what we can know about the physics of the universe. QM is a harsh mistress and strongly limits what we can know. So maybe it is not surprising that the same notation can be extended to represent what we know about more general objects. (Note, I say "QM notation", not "QM mathematics". There is no equivalent of the Schrödinger equation, for example.)

I guess ideally I would tack this on to Bort.
There are some Bort ideas that would fit neatly with this new scheme.


Seeya,
-ds.



Hiro <hiro.tani@gmail.com> Sat, Aug 10, 2013 at 2:56 AM
To: Garry Morrison <garry.morrison@gmail.com>
Okaaay. This is waaay out of my comfort zone.
Whatever you are working on, remember you need to demonstrate its practicality, novelty and usefulness at some stage (even theoretically - think of a provisional patent).
If you are working on QM-based ideas, it's most likely novel, but I will have no idea if it's practical or useful.
Do you need a QM computer to make your algorithms work? If it does, it's still great but it'll be way over my head.

The biggest problem with IBM's approach is the sheer brute force method they employ with their algorithms.
If you are working on ways to optimize this, then it may be very useful.

Hiro

Garry Morrison <garry.morrison@gmail.com> Sat, Aug 10, 2013 at 10:18 PM
To: Hiro <hiro.tani@gmail.com>
Hey Hiro,

First, you will understand it. It is really just a notational simplification of what I have already explained to you.
And it doesn't need a QM computer. Nothing even close to that elaborate or sophisticated.
I'll try and explain it and then you can comment on it.
Like I said, maybe it is trivial and boring? It certainly has strong similarities with what the cyc project has already done.
(Though I understand cyc restricts itself to boolean, ie yes/no. This scheme represents degrees of knowledge quite neatly.)

OK.
Recall GUIDs? For each concept, for each neuron, for each synapse there is a GUID.
And at any given time we say it is firing with a value of v.
So all I'm doing is borrowing the bra-ket notation as labels for GUIDs.
So, "Fred" firing with value 7 becomes simply:
7|Fred>

So what then is the meaning of <Fred| ?
It is the measuring of the value of the "Fred" neuron/synapse.
If we could do it physically, it would be applying a probe to the Fred neuron and measuring its value.
Note, a maths point (the orthogonality of bra-kets): if X and Y are GUIDs, then <X|Y> == 0 unless X == Y, and == 1 when X == Y.
If X or Y is a compound object that is not a single GUID then this rule does not apply.


See, nothing as complex as QM here. Not even close.
Now, from here, let's apply this a little.

First, a couple of definitions:
I call:
a|A> + b|B> + c|C> + d|D>
a superposition (ie, A,B,C,D are all firing at the same time). (NB, the plus sign).
Note, in a superposition you can change the order and it means essentially the same thing (as long as you are careful in a couple of spots)
I call:
u|U> . v|V> . w|W>
a sequence. Ie, there is an implicit time sequence/time delay between each element. (NB the dot).
The exact value of the time delay is not normally important, as long as it is greater than 0.
Though when it comes to music, yeah, it is vitally important! (eg, keeping time, or out of time).

Note, for a sequence the order is significant. If |B> time-wise comes before |Q> you can't then swap the order.
The idea of causality also comes in neatly around here.
eg, if |U> comes before |Y> it is possible |U> caused |Y>.
If the order is reversed, then it is not possible |U> caused |Y>  (unless time-travel is possible).


Then we can also have mixed.
eg:
a|A> + b|B> . p|P> . q|Q> . r|R> + y|Y> + z|Z>
and . has higher precedence than + so this is the same as:
a|A> + (b|B> . p|P> . q|Q> . r|R>) + y|Y> + z|Z>
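One way to see the superposition/sequence difference: a hedged Python sketch where a superposition is a dict of GUID -> coefficient (order irrelevant) and a sequence is a list of such dicts (order significant). The encoding is mine, purely for illustration:

```python
# a|A> + b|B> as a dict: order of entry does not matter.
superposition = {"A": 1.0, "B": 0.5}

# u|U> . v|V> . w|W> as a list: order does matter.
sequence = [{"U": 1.0}, {"V": 1.0}, {"W": 1.0}]

# Reordering a superposition leaves it equal:
assert superposition == {"B": 0.5, "A": 1.0}
# Reordering a sequence gives a different object:
assert sequence != [{"V": 1.0}, {"U": 1.0}, {"W": 1.0}]
```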

Note, QM has the concept of superposition, but as far as I recall the sequence idea is not used in QM.

While I am thinking of it, I should mention that a single object, in this case |X>, can have a time sequence too:
eg:
a|X> . b|X> . c|X> . d|X> . e|X> . f|X> ...


For a start, we can now use this notation to represent an event log.
Recall, something like this:
(t1,G1,v1)
(t2,G2,v2)
(t3,G3,v3)
(t3,G4,v4)
(t3,G5,v5)
(t3,G6,v6)
(t4,G7,v7)
(t5,G8,v8)

maps to this:
v1|G1> . v2|G2> . (v3|G3> + v4|G4> + v5|G5> + v6|G6>) . v7|G7> . v8|G8>
Note, in this case we don't really need to know the exact values of ti, just the ordering.
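The event-log mapping is mechanical enough to sketch in a few lines of Python (the tuple layout is taken from the example above; the grouping code is mine):

```python
from itertools import groupby

events = [(1, "G1", "v1"), (2, "G2", "v2"),
          (3, "G3", "v3"), (3, "G4", "v4"), (3, "G5", "v5"), (3, "G6", "v6"),
          (4, "G7", "v7"), (5, "G8", "v8")]

# One sequence step per timestamp; events sharing a timestamp
# become a superposition within that step.
seq = [{g: v for _, g, v in group}
       for _, group in groupby(events, key=lambda e: e[0])]

assert len(seq) == 5  # five "."-separated steps
assert seq[2] == {"G3": "v3", "G4": "v4", "G5": "v5", "G6": "v6"}
```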


Next, all those "relations" I was talking about previously. In this model they are operators acting on kets.
We use => to represent learning a new relation, and = to represent what we already know.
So on to the fun bit:

Say we have a child and mum only buys red apples.
We have:
colour|apple> => |red>
colour|orange> => |colour orange>
colour|lemon> => |yellow>
Where "colour" here is both a relation, and the equivalent of an operator in QM.
(recall last time I did this using a matrix to represent the values. Well, essentially all I'm doing is using bra-kets instead of typing up matrices. It is identical information, just a different representation. Though a representation that is much easier to type!)

What if Billy then learns the existence of green apples?
colour|apple> => 0.8 |red> + 0.2|green>

Then if Mum asks what colour are apples, Billy's brain does:
colour|apple> = 0.8 |red> + 0.2|green> 
Note the equals instead of the arrow.
I tend to swap between them, since they mean either "in the process of learning" or "already known".
For this document the difference in meaning is minimal.
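A minimal sketch of the => / = distinction in Python, storing learned rules in a plain dict keyed by (operator, ket). The helper names learn and recall are mine:

```python
rules = {}

def learn(op, ket, sp):
    """The "=>" of the notation: store/overwrite a rule."""
    rules[(op, ket)] = sp

def recall(op, ket):
    """The "=" of the notation: return what we already know."""
    return rules.get((op, ket), {})

learn("colour", "apple", {"red": 1.0})
learn("colour", "lemon", {"yellow": 1.0})
# Billy learns green apples exist:
learn("colour", "apple", {"red": 0.8, "green": 0.2})

assert recall("colour", "apple") == {"red": 0.8, "green": 0.2}
```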


Then later in life, we might have things like:
dialect |colour> = |UK English>
dialect |color> = |US English>
dialect |humour> = |UK English>
dialect |humor> = |US English>


Now, let's build some relations around the actor Hugh Laurie:
wikipedia-page |Hugh Laurie> = |http://en.wikipedia.org/wiki/Hugh_Laurie>
imdb-page |Hugh Laurie> = |http://www.imdb.com/name/nm0491402/>
best-known-for|Hugh Laurie> = 0.9|House tv> + 0.1|blackadder-tv>

Now, we can apply more than one relation at a time.
eg: (applying the imdb-page relation to the best-known-for relation applied to |Hugh Laurie>)
imdb-page:best-known-for|Hugh Laurie> = 0.9 imdb-page|House tv> + 0.1 imdb-page|blackadder-tv>
But we also know:
wikipedia-page:|House tv> = |http://en.wikipedia.org/wiki/House_(TV_series)>
wikipedia-page:|blackadder-tv> = |http://en.wikipedia.org/wiki/Blackadder>


How about some arithmetic?
arithmetic:(|3> . |+> . |5>) = |8>
arithmetic:(|7> . |*> . |7>) = |49>
Note this relation acted on a sequence, not just a single GUID/ket.

Also note on the right it is |8> and |49> not 8 and 49.
This is important. We are talking the concept of 8 and 49 here, not the values.
eg, we can then do things like:
factors|8> = |2> . |*> . |2> . |*> . |2>
factors|49> = |7> . |*> . |7>
|8> = |2> . |^> . |3>


Next, we can make use of the fact that if a GUID/ket has coefficient 0, we can drop it from the expression (though it does make the NOT operator tricky).
For example:
|a-list> => |137> + |chair> + |frog> + |table> + |5573>
Now, the is-number, is-furniture, is-animal operators:
(they return 1 for yes, and 0 for no).
is-number|a-list> = is-number|137> + is-number|chair> + is-number|frog> + is-number|table> + is-number|5573>
                         = |137> + 0|chair> + 0|frog> + 0|table> + |5573>
                         = |137> + |5573>

is-furniture|a-list> = is-furniture|137> + is-furniture|chair> + is-furniture|frog> + is-furniture|table> + is-furniture|5573>
                          = 0|137> + |chair> + 0|frog> + |table> + 0|5573>
                          = |chair> + |table>

is-animal|a-list> = is-animal|137> + is-animal|chair> + is-animal|frog> + is-animal|table> + is-animal|5573>
                        = 0|137> + 0|chair> + |frog> + 0|table> + 0|5573>
                        = |frog>

I guess I could mention the "number-has-value" operator here too:
number-has-value:is-number|a-list> = number-has-value(|137> + |5573>)
                                                   = 137|137> + 5573|5573>
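The drop-the-zeros behaviour falls out naturally if operators are applied linearly across a superposition. A hedged sketch, with toy stand-ins for is-number and number-has-value:

```python
def apply_op(op, sp):
    """Apply op ket-by-ket across a superposition, sum the results,
    and drop any ket whose coefficient ends up 0."""
    out = {}
    for ket, coeff in sp.items():
        for k2, c2 in op(ket).items():
            out[k2] = out.get(k2, 0) + coeff * c2
    return {k: c for k, c in out.items() if c != 0}

def is_number(ket):          # toy version: returns 1 for yes, 0 for no
    return {ket: 1 if ket.isdigit() else 0}

def number_has_value(ket):   # toy version: the number as its own coefficient
    return {ket: int(ket)}

a_list = {"137": 1, "chair": 1, "frog": 1, "table": 1, "5573": 1}
assert apply_op(is_number, a_list) == {"137": 1, "5573": 1}
```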

And we can use superpositions to represent incomplete knowledge.
eg:
is-married|Fred> = 0.5 |yes> + 0.5 |no>
German|the> = 0.33|der> + 0.33|die> + 0.33|das>
German|you> = 0.7|Sie> + 0.3|du>
As we acquire more knowledge we can change the coefficients.

eg if we learn of Fred's recent marriage we do:
is-married|Fred> => |yes>  (NB the arrow here, not just an equal sign).

And this is the first hint of using this scheme for language translation.
Though a lot of work to get there from here.
Say we have the sequence:
|text> => |word1> . |word2> . |word3> . |word4>
Then we consider:
German|text> = German|word1> . German|word2> . German|word3> . German|word4>
But this is only a very crude translation.
As in "the" and "you" above, it is not a one-to-one mapping (there are 3 meanings of "the", and 2 of "you").
And further, the grammatical order of the language is probably different (depending on language pairs).
So to get from the crude translation to a proper translation would require a lot of further processing!
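The crude word-for-word step is easy to sketch; the toy German dict below only covers the two words from above, and everything else passes through unchanged:

```python
German = {"the": {"der": 0.33, "die": 0.33, "das": 0.33},
          "you": {"Sie": 0.7, "du": 0.3}}

def crude_translate(words):
    # German|text> = German|word1> . German|word2> . ...
    # Unknown words map to themselves, coefficient 1.
    return [German.get(w, {w: 1.0}) for w in words]

assert crude_translate(["you"]) == [{"Sie": 0.7, "du": 0.3}]
```

The hard parts, picking the right one of der/die/das and reordering for grammar, are exactly the further processing mentioned above.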



Hrmm... still haven't mentioned bras yet.
Here is a nice example:
|list-of-Australian-cities> => |Adelaide> + |Perth> + |Canberra> + |Melbourne> + |Sydney>

population:|list-of-Australian-cities> = 1300000|Adelaide> + 1900000|Perth> + 370000|Canberra> + 4200000|Melbourne> + 4700000|Sydney>

Since bra-kets (for GUIDs) are orthogonal, <Adelaide|X> == 0 unless X == Adelaide.
So we can do:
"population of Adelaide"
maps to:
<Adelaide|population|list-of-Australian-cities>
  = 1300000
(and to someone that has studied bra-kets this is a pretty sight!)

"population of Melbourne"
maps to:
<Melbourne|population|list-of-Australian-cities>
  = 4200000
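In the toy dict encoding, applying a bra is just reading off one coefficient:

```python
population = {"Adelaide": 1300000, "Perth": 1900000, "Canberra": 370000,
              "Melbourne": 4200000, "Sydney": 4700000}

def bra(ket, sp):
    """<ket| applied to a superposition: by orthogonality, every term
    vanishes except the matching ket, leaving its coefficient."""
    return sp.get(ket, 0)

assert bra("Adelaide", population) == 1300000
assert bra("Melbourne", population) == 4200000
```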


Next, you can use sequences of kets to represent grammar.
Too lazy to dig deeply at the moment, but for a start:
plural|word> => |word> . |s>
plural|foot> => |feet>
plural|tooth> => |teeth>

(ie, we have the general case, and then the more specific cases overrule it).
Indeed, talking of grammar. I'm pretty sure the grammar that defines valid Bort code can also be represented in this scheme.
In general a grammar is just a sequence of objects, where the objects are general categories.

Then things like:
spell|Garry> = |G> . |a> . |r> . |r> . |y>
spell|Hiro> = |H> . |i> . |r> . |o>

spell:plural|dog> = spell(|dog> . |s>) = |d> . |o> . |g> . |s>

full-name|Hiro> = |Hiroaki> . |Tani>
friends|Garry> = |Hiro> + |Gary> + |Suresh> + |Peter>
(or I could put coeffs to represent the strength of friendship).

I'm losing track of what I wanted to mention ...
How about this next.
Give the email I am replying to the label |most-recent-Hiro-email>.
Then we can do:
sent-by:|most-recent-Hiro-email> = |Hiro>
sent-to: |most-recent-Hiro-email> = |Garry>
time-sent:|most-recent-Hiro-email> = |2:56 AM>
body-of-post:|most-recent-Hiro-email> = |Okaaay.> .  |This> . |is> .  |waaay> . |out> . |of> . |my> . |comfort> . |zone.> . |new-line> ...

And of course you can do similar for twitter posts, etc.

Or the front page of slashdot:
|front-page> => |post1> . |post2> . |post3> . |post4> ....
title|post1> = |Ask> . |Slashdot:> . |How> . |Do> . |I> ...
posted-by|post1> = |timothy>
post-time|post1> = |Saturday August 10, 2013 @03:06AM>
from-the-dept|post1> = |use-all-caps-and-lots-of-imperatives>
summary|post1> = |First> . |time> . |accepted> . |submitter> ...
Now, how about some pattern recognition?
We use the simm, but the notation is this:
|h> . |i> . |r> . |o> => |hiro>
ie, if we see h,i,r,o in sequence the "hiro" GUID fires
(tangent: this is something I like to call a non-linear resonance.
When a very specific shape/pattern is detected suddenly something resonates.
In this case the |hiro> neuron to the h,i,r,o sequence.
But more generally think of the sound waveform that causes the |hiro> neuron to fire, or say the sound waveform that causes the |frog> neuron to fire.)

If instead it is a partial sequence, we use simm and we get something like:
|h> .  |r> . |o> = 0.75 |hiro>   (NB, the missing |i>, so simm says only a 75% match).

Another tangent is my idea of "order" (something I have mentioned a few times in the past).
Anything that is on the right hand side of a "=>" has higher order than something on the left.
eg, |hiro> is higher order than |h>, or |i> or etc.

Another example:
|3> . |.> . |1> . |4> . |1> . |5> . |9> => |pi>
So |pi> has higher order than the individual numbers on the left.

More generally, we can also have:
a|A> . b|B> . c|C> => |object>   (NB, the coeffs with values other than just 1)
then simm works out how close the sequence is, and makes that the coeff of |object>
ie, 1 for exact/perfect match, and 0 for completely different.

=====
I've been liberal with details, google snooping on my work be damned, here is simm:
(for the case where the values are >= 0).
simm[w,f,g] = Sum[ w[i]*min[f[i],g[i]] ; for i in 1..|f| ]
(also, this is the non-normalized version - the normalization is 2*max[w*f, w*g])
=====
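Reading that definition as: the numerator Sum[w[i]*min[f[i],g[i]]] doubled, divided by the normalization 2*max[w*f, w*g] (so the 2s cancel), a sketch for same-length lists with values >= 0 is:

```python
def simm(w, f, g):
    """Similarity of equal-length value lists f and g under weights w,
    for values >= 0: 1 for a perfect match, 0 for no overlap."""
    overlap = sum(wi * min(fi, gi) for wi, fi, gi in zip(w, f, g))
    norm = max(sum(wi * fi for wi, fi in zip(w, f)),
               sum(wi * gi for wi, gi in zip(w, g)))
    return overlap / norm if norm else 1.0

assert simm([1, 1, 1], [1, 2, 3], [1, 2, 3]) == 1.0
assert simm([1, 1, 1], [1, 0, 0], [0, 1, 0]) == 0.0
# The h,r,o vs h,i,r,o example, with the missing |i> as a zero:
assert simm([1, 1, 1, 1], [1, 1, 1, 1], [1, 0, 1, 1]) == 0.75
```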

And it is not necessarily a sequence. A superposition (if order is irrelevant) or mixed is fine too.
eg this example of the mixed case:
d|D> + k|K> . j|J> . p|P> + u|U> + v|V> + x|X> . y|Y> . z|Z> => |another-object>

I haven't explained this section well, but it is the most powerful part.
You tell the system what patterns to expect and then when found it activates the corresponding GUID, with the appropriate size coefficient.

Indeed, you can even use it for sequence prediction.
eg:
a|A> . b|B> . c|C> . d|D> => e|E> . f|F>

Which itself ties in closely to the ngram-stitch thing I have been playing with.
In general terms you have a BIG file/list of:
|word1> . |word2> . |word3> . |word4> . |word5>
(at least in the 3/5 version - 3 word overlap using 5-grams)
Say we have the starting sequence:
|start> => |a> . |b> . |c> . |d> . |e>
then we select the last 3 elements, which in bort is:
select[(3:..1:)]:|start>
(note the use of colon numbers - they index from the reverse end of a list. 1: is the end, 2: is the 2nd from end, and so on).
which maps to:
|c> . |d> . |e>
then we have a working file that is a list of all the lines in our 5-gram file such that they start with |c> . |d> . |e>
Then we randomly pick one of these.
eg: |c> . |d> . |e> . |p> . |q>
then select the last 2 elements of this, which in bort is:
select[(2:..1:)]:(|c> . |d> . |e> . |p> . |q>)
= |p> . |q>
then tack this onto |start>
ie:
|start> . |p> . |q>
Then iterate.
In English: you have a sequence and then you use sequence prediction to work out what should follow next, then iterate.
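A toy version of the 3/5 ngram-stitch loop (the three 5-grams here are made up purely to exercise the code):

```python
import random

ngrams = [["c", "d", "e", "p", "q"],
          ["c", "d", "e", "x", "y"],
          ["p", "q", "r", "s", "t"]]

def extend(seq):
    """Find 5-grams whose first 3 words match the last 3 of seq,
    pick one at random, and tack its last 2 words onto seq."""
    tail = seq[-3:]
    candidates = [g for g in ngrams if g[:3] == tail]
    if not candidates:
        return seq
    return seq + random.choice(candidates)[-2:]

longer = extend(["a", "b", "c", "d", "e"])
assert longer[:5] == ["a", "b", "c", "d", "e"]
assert longer[5:] in (["p", "q"], ["x", "y"])
```

Iterating extend gives the stitched sequence.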



OK. So the above mostly focused on operators that act on kets.
It is also useful to have functions that act on the coefficients of the kets (as I recall, I don't think QM has this).
eg:
threshold-filter[x] ::= return 0 if [x < 10] else x;
|Y> => 3|a> + 13|b> + 2|c> + 530|d>
Then we have:
threshold-filter:|Y> = 0|a> + 13|b> + 0|c> + 530|d>
  = 13|b> + 530|d>

By itself this is probably quite boring, but is actually a very powerful idea!
Another one is:
binary-filter[x] ::= return 0 if [x < 1] else 1;
So:
binary-filter:(0.5|a> + 3|b> + 0.2|c> + 21|d> + 0.1|e>) = |b> + |d>
Indeed, these things are called sigmoids in the previous emails I have sent you.
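Both filters act on coefficients only, so one generic helper covers them (a sketch; the cut-offs 10 and 1 are taken from the definitions above):

```python
def coeff_filter(f, sp):
    """Apply f to every coefficient, then drop kets that went to 0."""
    return {k: f(c) for k, c in sp.items() if f(c) != 0}

def threshold_filter(x):
    return 0 if x < 10 else x

def binary_filter(x):
    return 0 if x < 1 else 1

Y = {"a": 3, "b": 13, "c": 2, "d": 530}
assert coeff_filter(threshold_filter, Y) == {"b": 13, "d": 530}
assert coeff_filter(binary_filter,
                    {"a": 0.5, "b": 3, "c": 0.2, "d": 21, "e": 0.1}) == {"b": 1, "d": 1}
```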


What's next?
How about the idea of "six degrees of separation"?
Say we have the people-you-know operator.
Let's abbreviate that to PYK.
eg:
PYK|Alice> = |Alice> + |Bob> + |Carl>    NB, |Alice> is on the right since presumably Alice knows Alice!
PYK|Bob> = |Bob> + |David> + |Eve>
PYK|Carl> = |Carl> + |Fred> + |George> + |Harry>

Then we can do:
PYK:PYK:|Alice> = PYK( |Alice> + |Bob> + |Carl>)
  = |Alice> + |Bob> + |Carl> + PYK|Bob> + PYK|Carl>
  = |Alice> + |Bob> + |Carl> + |Bob> + |David> + |Eve> + |Carl> + |Fred> + |George> + |Harry>
  = |Alice> + 2|Bob> + 2|Carl> + |David> + |Eve> + |Fred> + |George> + |Harry>

Then if the "six degrees of separation" idea is true, ie, for any person there is at most 6 steps from any other person, we would have:
PYK:PYK:PYK:PYK:PYK:PYK:|person> == |list-of-all-people-on-Earth>

Or we could write it as:
<person2|PYK:PYK:PYK:PYK:PYK:PYK:|person1> == 1
for any two people "person1" and "person2".
which we can abbreviate to:
<person2|PYK:^6|person1> == 1

I personally think it is false, but it might be close. Maybe it is 13 or something??

And we have things like George is 2 degrees of separation from Alice:
<George|PYK:^2|Alice> == 1.
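The PYK chaining is just repeated linear application of the operator; a sketch over the three-person table above (people missing from the table are assumed to know at least themselves):

```python
PYK = {"Alice": {"Alice": 1, "Bob": 1, "Carl": 1},
       "Bob":   {"Bob": 1, "David": 1, "Eve": 1},
       "Carl":  {"Carl": 1, "Fred": 1, "George": 1, "Harry": 1}}

def apply_pyk(sp):
    """One application of PYK to a superposition of people."""
    out = {}
    for person, coeff in sp.items():
        for known, c in PYK.get(person, {person: 1}).items():
            out[known] = out.get(known, 0) + coeff * c
    return out

two_steps = apply_pyk(apply_pyk({"Alice": 1}))
assert two_steps == {"Alice": 1, "Bob": 2, "Carl": 2, "David": 1,
                     "Eve": 1, "Fred": 1, "George": 1, "Harry": 1}
# <George|PYK:^2|Alice> == 1:
assert two_steps.get("George", 0) == 1
```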


Anyway, this is just a general way to represent networks.
eg we could do something similar with urls.
eg:
links-to|url1> = |url2> + |url3> + |url4>
links-to|url2> = |url5> + |url6> + ...
and so on.


Anyway, summary time.
I guess in this scheme AI reduces to the problem of defining the set of all relations and operators applied to GUIDs.
I guess initially you could construct these by hand, but eventually you want them to self-generate by feeding in data from the web.
Context becomes rules that apply to a subset of knowledge. ie, in different "contexts" you have different links between GUIDs.

Related projects:
cyc seems to be doing something very similar but concerned with binary/boolean truths, ie yes/no. My scheme represents degrees of knowledge. That doesn't of course mean we couldn't slurp their data into our model :)
Ditto the "semantic web".
IBM seems to be brute forcing neural networks. Creating vast linkages of neurons without any meaning/semantics applied to such networks. They are throwing too much money at this for them to not find some interesting results!
Relational databases: they seem capable of representing a great deal of human knowledge, so there is most likely a tie in here between this scheme and theirs.


Now, the tie-in to Bort:
first is the sort operator (eventually we could encode this in the scheme above, but for now just assume its existence).
eg:
sort(3|a> + 37|b> + 51|c> + 2|d>)
maps to:
51|c> + 37|b> + 3|a> + 2|d>

the normalize operator (re-weight the coeffs so that the sum == 1).
eg:
normalize(a|A> + b|B> + c|C> + d|D>)
maps to:
sum = a + b + c + d
a/sum|A> + b/sum|B> + c/sum|C> + d/sum|D>

the select operator
eg:
select[k]:object
maps to:
return the k'th element.
eg:
select[3]:(|a> + |b> + |c> + |d>)
maps to:
|c>
select[(2,4)]:(|a> + |b> + |c> + |d>)
maps to:
|b> + |d>
something similar for sequences.
Slightly more complicated for the mixed case.

the number-of-elements operator |X|.
eg:
||a> + |b>| = 2
||x>| = 1
||a> + |b> + |c> + |d> + |e> + |f>| = 6.

pick-elt:(|a> + |b> + |c> + |d> + |e> + |f>)
randomly returns one of the elements.

pick[n]:(|a> + |b> + |c> + |d> + |e> + |f>)
randomly returns n of the elements.

The mixed case of superpositions + sequences is more complex, but similar.
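For the superposition case, all four operators are a few lines each in the dict encoding (sketches, not definitive; and number-of-elements is just len):

```python
import random

def sort_sp(sp):
    """sort: order kets by decreasing coefficient."""
    return dict(sorted(sp.items(), key=lambda kv: -kv[1]))

def normalize(sp):
    """normalize: rescale coefficients so they sum to 1."""
    total = sum(sp.values())
    return {k: c / total for k, c in sp.items()}

def select(ks, sp):
    """select: keep the k'th elements, 1-indexed."""
    items = list(sp.items())
    return dict(items[k - 1] for k in ks)

def pick(n, sp):
    """pick[n]: randomly return n of the elements."""
    return dict(random.sample(list(sp.items()), n))

sp = {"a": 3, "b": 37, "c": 51, "d": 2}
assert list(sort_sp(sp)) == ["c", "b", "a", "d"]
assert abs(sum(normalize(sp).values()) - 1) < 1e-9
assert select([2, 4], {"a": 1, "b": 1, "c": 1, "d": 1}) == {"b": 1, "d": 1}
assert len(pick(2, sp)) == 2
assert len(sp) == 4   # the number-of-elements operator |X|
```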


Finally, there is a thing that is very common in QM called a projection operator |X><X|.
Making use of that is for another time.
Besides, I haven't fully fleshed out how projection operators work when you mix superpositions and sequences.


That is more than enough for today.
I think it is time to cook tea, and watch Mad Max.
OK. I had a read over, and I suspect it will take some work for you to follow.
Well, a) it took a LONG time to write, and b) the document organization could do with some work! :)


Seeya,
-ds.


Hiro <hiro.tani@gmail.com> Sat, Aug 24, 2013 at 5:35 AM
To: Garry Morrison <garry.morrison@gmail.com>
Alright, sorry for the delay in reply.
Holy crap this is a long email!!
You wrote a ton of info here, but very carefully explained so I do get the concept now. 
It's way cleaner and easier for humans to understand than using matrices. 
Since humans need to build the database, it probably will help to make it legible especially when things start to get complicated. (Hey I didn't know you knew some German btw)

So here's my naive question. To me these notations make it a higher-order language rather than the assembly code. Don't you still have to figure out how these data are stored and recalled efficiently (enough to map the web and knowledge of mankind)? Could the bra-ket notations make it less efficient under the hood, or is that irrelevant because that's a separate issue?

Hiro